On the Optimal Reward Function of the Continuous Time Multiarmed Bandit Problem
Authors
Abstract
The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple (product-form) expression in terms of individual stopping problems, without requiring any smoothness of the optimal reward function, either for the global problem or for the individual stopping problems. Some results on a related problem with switching costs are also obtained.
Key words: variational inequality, switching problem, bandit problem, dynamic programming, index policy
AMS(MOS) subject classifications: 35B37, 49A60, 49B60, 60J25, 93E20
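The index-policy structure behind this product-form result can be sketched concretely. The snippet below is a minimal illustration, not the paper's construction: it assumes each arm's index function has already been computed from that arm's individual stopping problem, and shows only how the controller then reduces the multiarmed problem to comparing per-arm indices.

```python
def index_policy_step(arm_states, index_fns):
    """One step of an index policy: each arm carries an index computed
    from its own one-armed stopping problem, and the controller pulls
    the arm whose current index is largest.

    `arm_states` and `index_fns` are hypothetical per-arm states and
    index functions, supplied here purely for illustration.
    """
    indices = [f(s) for f, s in zip(index_fns, arm_states)]
    return max(range(len(indices)), key=lambda i: indices[i])
```

For example, with two hypothetical linear index functions, the step simply selects whichever arm's index evaluates higher at its current state.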
Similar articles
Continuous Time Associative Bandit Problems
In this paper we consider an extension of the multiarmed bandit problem. In this generalized setting, the decision maker receives some side information, performs an action chosen from a finite set and then receives a reward. Unlike in the standard bandit settings, performing an action takes a random period of time. The environment is assumed to be stationary, stochastic and memoryless. The goal...
Finite-Time Regret Bounds for the Multiarmed Bandit Problem
We show finite-time regret bounds for the multiarmed bandit problem under the assumption that all rewards come from a bounded and fixed range. Our regret bounds after any number T of pulls are of the form a + b log T + c log² T, where a, b, and c are positive constants not depending on T. These bounds are shown to hold for variants of the popular ε-greedy and Boltzmann allocation rules, and for a ...
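As a point of reference, the ε-greedy allocation rule mentioned above is easy to simulate. The sketch below assumes Bernoulli arms with hypothetical success probabilities; it illustrates the rule itself, not the constants a, b, c in the regret bound.

```python
import random

def epsilon_greedy(arm_means, epsilon=0.1, horizon=1000, seed=0):
    """Simulate the epsilon-greedy allocation rule on Bernoulli arms.

    `arm_means` are hypothetical true success probabilities. With
    probability epsilon the rule explores a uniformly random arm;
    otherwise it pulls the arm with the highest empirical mean.
    Returns (total reward, regret against always pulling the best arm).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    sums = [0.0] * k
    total_reward = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon or 0 in counts:
            arm = rng.randrange(k)  # explore (or finish initial pulls)
        else:
            arm = max(range(k), key=lambda i: sums[i] / counts[i])  # exploit
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    regret = horizon * max(arm_means) - total_reward
    return total_reward, regret
```

Averaging the returned regret over many seeds and horizons is how the logarithmic growth in T would be observed empirically.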
The Max K-Armed Bandit: A New Model of Exploration Applied to Search Heuristic Selection
The multiarmed bandit is often used as an analogy for the tradeoff between exploration and exploitation in search problems. The classic problem involves allocating trials to the arms of a multiarmed slot machine to maximize the expected sum of rewards. We pose a new variation of the multiarmed bandit—the Max K-Armed Bandit—in which trials must be allocated among the arms to maximize the expecte...
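The shift of objective, from the expected sum of rewards to the expected best single reward, can be sketched as follows. The Gaussian arm distributions, the ε parameter, and the pull-the-best-observed-maximum rule are all illustrative assumptions here, not the model or algorithm proposed in the paper.

```python
import random

def max_k_armed(arm_dists, epsilon=0.2, trials=500, seed=1):
    """Sketch of the Max K-Armed Bandit objective: allocate trials among
    arms to maximize the single best reward observed, not the sum.

    `arm_dists` is a hypothetical list of (mu, sigma) Gaussian arms.
    The rule pulls the arm with the highest observed maximum so far,
    exploring uniformly with probability epsilon.
    """
    rng = random.Random(seed)
    k = len(arm_dists)
    best = [float("-inf")] * k  # best sample seen on each arm
    for t in range(trials):
        if t < k or rng.random() < epsilon:
            arm = t % k if t < k else rng.randrange(k)  # explore
        else:
            arm = max(range(k), key=lambda i: best[i])  # exploit the max
        mu, sigma = arm_dists[arm]
        best[arm] = max(best[arm], rng.gauss(mu, sigma))
    return max(best)  # the quantity the Max K-Armed Bandit maximizes
```

Note that under this objective, heavy-tailed or high-variance arms can be preferable even when their means are lower, which is what distinguishes the model from the classic sum-of-rewards bandit.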
Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management
Consider the Markov decision problems (MDPs) arising in the areas of intelligence, surveillance, and reconnaissance in which one selects among different targets for observation so as to track their position and classify them from noisy data [9], [10]; medicine in which one selects among different regimens to treat a patient [1]; and computer network security in which one selects different compu...
On the efficiency of Bayesian bandit algorithms from a frequentist point of view
In this contribution, we argue that algorithms derived from the Bayesian modelling of the multiarmed bandit problem are also optimal when evaluated using the frequentist cumulated regret as a measure of performance. We first show that the classical Gittins argument can be applied to convert the finite-horizon Bayesian multiarmed bandit problem into an MDP planning task that is numerically solva...
Journal:
Volume Issue
Pages -
Publication date: 2016